Text File | 1994-06-21 | 30KB | 859 lines

Linuxdoc-SGML User's Guide
Matt Welsh, mdw@sunsite.unc.edu
v1.3, 7 June 1994

This document is a user's guide to the linuxdoc-sgml formatting system, an SGML-based text formatter which allows you to produce LaTeX, plain ASCII, and HTML from a single source format. This guide documents Linuxdoc-SGML version 1.1.

1. Introduction

This is a user's guide to the linuxdoc-sgml document processing system, for use with Linux documentation. linuxdoc-sgml is an SGML DTD (Document Type Definition) and set of ``replacement files'' which convert the SGML to groff, LaTeX, and HTML source. In the future, linuxdoc-sgml will support texinfo, as well as other formats.

linuxdoc-sgml is based heavily on the QWERTZ DTD by Tom Gordon, thomas.gordon@gmd.de. I have only made revisions to his DTD and replacement files for use by Linux documentation.

linuxdoc-sgml is not meant to be a general document-processing system. Although it can be used for documents of many types, I have tailored it for use by the Linux documentors in producing HOWTOs, FAQs, and (later) the Linux Documentation Project manuals. Therefore, I have tweaked features into and out of the system for this purpose. If you see a lack of generality in the system, that is the reason. There's nothing binding linuxdoc-sgml to Linux documentation, but all documents produced by the system will look a certain way. If you want things to look different, I suggest that you use a more generalized system such as the plain QWERTZ DTD.

One of the goals of this system is to make documents easy to produce in numerous formats. Until now, most Linux documentation has been produced in plain ASCII through manual editing. A system like groff can take care of the plain-text formatting, but that still doesn't give you HTML (for use on the World Wide Web), LaTeX (for nicely printed documents), or texinfo. Therefore, if there are features missing from this system that you would like, please let me know!
The idea is that we shouldn't have to use a lot of hackery to produce good-looking docs in multiple formats. The author should have to do as little as possible.

1.1. About this document

This document is written using the linuxdoc-sgml DTD. It contains more or less everything you need to know to write SGML docs with this DTD. See example.sgml for an example of an SGML document that you can use as a model for your own docs.

1.2. Why SGML?

I chose SGML for this system because SGML is made specifically for translation to other formats. SGML, which stands for Standard Generalized Markup Language, allows you to specify the structure of a document---that is, what kinds of things make up the document. You specify the structure of a document with a DTD (Document Type Definition). linuxdoc-sgml is one DTD that specifies the structure for Linux HOWTOs and other docs. QWERTZ is another DTD; the SGML standard provides DTDs for books, articles, and other generic document types.

The DTD specifies the names of ``elements'' within the document. An element is just a bit of structure---like a section, a subsection, a paragraph, or even something smaller like emphasised text. Unlike LaTeX, however, these elements are not in any way intrinsic to SGML itself. The linuxdoc-sgml DTD happens to define elements that look a lot like their LaTeX counterparts---you have sections, subsections, verbatim ``environments'', and so forth. However, using SGML you can define any kind of structure for the document that you like. In a way, SGML is like low-level TeX, while the linuxdoc-sgml DTD is like LaTeX.

Don't be confused by this analogy. SGML is not a text-formatting system. There is no ``SGML formatter'' per se. SGML source is only converted to other formats for processing. Furthermore, SGML itself is used only to specify the document structure. There are no text-formatting facilities or ``macros'' intrinsic to SGML itself. All of those things are defined within the DTD.
You can't use SGML without a DTD---a DTD defines what SGML does.

1.3. How it works

Here's how processing a document with SGML and the linuxdoc-sgml DTD works. First, you need a DTD. I'm using the QWERTZ DTD which was produced, originally, by a group of people who needed a LaTeX-like DTD. I've modified the QWERTZ DTD to produce the linuxdoc-sgml DTD for our purposes. The DTD simply sets up the structure of the document. A small portion of it looks like this:

    <!element article - -
            (titlepag, header?, toc?, lof?, lot?, p*, sect*,
             (appendix, sect+)?, biblio?) +(footnote)>

This part sets up the overall structure for an ``article'', which is like a ``documentstyle'' within LaTeX. The article consists of a title page (titlepag), an optional header (header), an optional table of contents (toc), optional lists of figures (lof) and tables (lot), any number of paragraphs (p), any number of top-level sections (sect), optional appendices (appendix), an optional bibliography (biblio) and footnotes (footnote).

As you can see, the DTD doesn't say anything about how the document should be formatted or what it should look like. It just defines what parts make up the document. Elsewhere in the DTD the structure of the titlepag, header, sect, and other elements is defined.

You don't need to know anything about the syntax of the DTD in order to write documents. I'm just presenting it so you know what it looks like and what it does. You do need to be familiar with the document structure that the DTD defines. If not, you might violate the structure when attempting to write a document, and be very confused about the resulting error messages. We'll describe the structure of linuxdoc-sgml documents in detail later.

The next step is to write a document using the structure defined by the DTD. Again, the linuxdoc-sgml DTD makes documents look a lot like LaTeX---it's very easy to follow. In SGML jargon a single document written using a particular DTD is known as an ``instance'' of that DTD.
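
As a rough illustration of an ``instance'', the following shell sketch writes a minimal document that follows the article structure shown above. The tags are the ones described in section 3 of this guide; the file name minimal.sgml and the title, author, date, and text are placeholders invented for this example, not part of the distribution.

```shell
#!/bin/sh
# Write a minimal linuxdoc-sgml instance to minimal.sgml.
# ASSUMPTION: the file name and all of the text content are placeholders.
cat > minimal.sgml <<'EOF'
<!doctype linuxdoc system>

<article>

<title>A Minimal Example
<author>A. N. Author
<date>v0.1, 1 January 1994
<abstract>
A placeholder abstract.
</abstract>

<toc>

<sect>Introduction
<p>
This is the body of the first section.

</article>
EOF
```

The quoted heredoc delimiter ('EOF') keeps the shell from touching anything inside the document, so the SGML is written out exactly as typed.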

In order to translate the SGML source into another format (such as LaTeX or nroff) for processing, the SGML source (the document that you wrote) is parsed along with the DTD by (you guessed it) the SGML parser. I'm using the sgmls parser by James Clark, jjc@jclark.com, who also happens to be the author of groff. We're in good hands. The parser (the executable sgmls) simply picks through your document and verifies that it follows the structure set forth by the DTD. It also spits out a more explicit form of your document, with all ``macros'' and elements expanded, which is understood by sgmlsasp, the next part of the process.

sgmlsasp is responsible for converting the output of sgmls to another format (such as LaTeX). It does this using replacement files, which describe how to convert elements in the original SGML document into corresponding source in the ``target'' format (such as LaTeX or nroff).

For example, part of the replacement file for LaTeX looks like:

    <itemize>    +  "\\begin{itemize}"  +
    </itemize>   +  "\\end{itemize}"    +

This says that whenever you begin an itemize element in the SGML source, it should be replaced with \begin{itemize} in the LaTeX source, and that the end of the element should be replaced with \end{itemize}. (As I said, elements in the linuxdoc-sgml DTD are very similar to their LaTeX counterparts.)

So, to convert the SGML to another format, all you have to do is write a new replacement file for that format that gives the appropriate analogues to the SGML elements in that new format. In practice, it's not that simple---for example, if you're trying to convert to a format that isn't structured at all like your DTD, you're going to have trouble. In any case, it's much easier to do than writing individual parsers and translators for many kinds of output formats; SGML provides a generalized system for converting one source to many formats.

Once sgmlsasp has completed its work, you have LaTeX source which corresponds to your original SGML document, which you can format using LaTeX as you normally would.
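
The two stages described above can be sketched as a small shell function. This is only an illustration of the data flow, not the actual conversion script shipped with the package: the function name sgml_to_latex, the DRY_RUN switch, and the MAPPING path are assumptions made for this example, and how the parser locates the DTD is left out.

```shell
#!/bin/sh
# Sketch of the parse-then-replace pipeline: sgmls | sgmlsasp.
# ASSUMPTION: MAPPING is a hypothetical path to a LaTeX replacement file;
# the real location inside the linuxdoc-sgml tree may differ.
MAPPING=${MAPPING:-latex.mapping}

sgml_to_latex() {
    src=$1
    out=${src%.sgml}.tex
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the pipeline; nothing is executed.
        echo "sgmls $src | sgmlsasp $MAPPING > $out"
    else
        # Stage 1: sgmls parses and validates the document.
        # Stage 2: sgmlsasp applies the replacement file to the parser output.
        sgmls "$src" | sgmlsasp "$MAPPING" > "$out"
    fi
}
```

Running DRY_RUN=1 sgml_to_latex foo.sgml prints the pipeline without executing it, which is a cheap way to check the command on a machine where sgmls is not installed.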

Later in this document I'll give examples and show the commands used to do the translation and formatting. You can do this all on one command line. But first, I should describe how to install and configure the software.

2. Installation

The file linuxdoc-sgml.tar.gz contains everything that you need to write SGML documents and convert them to LaTeX, nroff, and HTML. In addition to this package, you will need one or both of the following:

1. groff. You need version 1.08 or 1.09. Apparently some of the margin-handling in groff is in a state of flux from version to version; they both work, but you get slightly different results. (Particularly, with 1.09 the left margin isn't indented two characters as it is in 1.08. There is a way around it, but it looks terrible on 1.08.) Versions previous to 1.08 will not work. You can get this from prep.ai.mit.edu in /pub/gnu. There is a Linux binary version on sunsite as well. You will need groff to produce plain ASCII from your SGML docs. (TeX/LaTeX will be used to produce nicely-printed PostScript and .dvi.)

2. TeX and LaTeX. This is available more or less everywhere; you should have no problem getting it and installing it (there is a Linux binary distribution on sunsite). Of course, you only need TeX/LaTeX if you want to format your SGML docs with LaTeX. So, installing TeX/LaTeX is optional. See the section on the Linux HOWTO project below for how we'll manage this vis-a-vis the Linux HOWTOs.

3. If you want to view the generated HTML, I suggest getting NCSA Mosaic 2.2 or later.

Neither groff nor TeX/LaTeX is required by the SGML system itself, but I suggest that you get one or the other in order to format your docs and verify that they look all right before distributing them.

2.1. Installing the software

The steps needed to install and configure the linuxdoc-sgml stuff are as follows:

1. First, unpack the tar file linuxdoc-sgml.tar.gz somewhere. This will create the directory linuxdoc-sgml where all of the SGML files live.
   It doesn't matter where you unpack this file; just don't move things around within the linuxdoc-sgml directory.

2. Next, you need to compile the sgmls parser. In the linuxdoc-sgml/sgmls-1.1 directory, issue the commands:

       $ make config.h
       $ make
       $ make install
       $ make install.man

   This should compile the parser and translator, and place the binaries sgmls, sgmlsasp, and rast in linuxdoc-sgml/bin. I suggest that you don't move those binaries from that location; instead, make symlinks to them from /usr/local/bin or place linuxdoc-sgml/bin on your path. (If you move things around within the linuxdoc-sgml tree you'll have to edit a number of files to get everything to cooperate again. Best to leave things as-is.)

   If things don't work, try editing the Makefile in the sgmls-1.1 directory. I have it set to use gcc as the compiler, and use rather malignant options. It compiles fine on Linux and sun-4 systems.

   This will also install man pages for the three binaries in linuxdoc-sgml/man. You can move those or link them to your regular man page tree, should you need them.

3. Edit the variables at the top of the scripts format, qroff, preroff, prehtml, and qtex in linuxdoc-sgml/bin. All you really need to edit is the value of the LINUXDOC shell variable, which gives the full pathname of the linuxdoc-sgml directory.

4. In the html-fix directory, issue the commands:

       $ make
       $ make install

   This will build fixref and html2html, which are post-processors for the HTML conversion, and place them in the bin directory.

If all went well, you should be ready to use the system. Just be sure that linuxdoc-sgml/bin is on your path or you've linked the files therein to your standard binary directories. Again, don't just copy them somewhere else; the scripts expect to find each other in that directory.

2.2. Testing it out

You can now test the system. The format script takes an SGML document as input and translates it to a given format.
The qtex script will process the output of format using LaTeX, and qroff will process it using nroff.

Let's say you have the SGML document foo.sgml. You can translate it to LaTeX, and produce PostScript output (via dvips) with the command:

    $ format -Tlatex foo | qtex > foo.ps

Or, you can produce a DVI file using the -d switch with qtex, as so:

    $ format -Tlatex foo | qtex -d > foo.dvi

If you want to produce plain ASCII, through groff, use the command:

    $ format -Tnroff foo | qroff > foo.txt

Note that I have tailored the groff conversion for plain ASCII output. (That is, I've removed page headers, page numbers, changed the margins, and so on.) With some hacking you can produce PostScript and DVI from the groff resulting from format, but I suggest that you use LaTeX for that instead.

If you want to produce HTML, the procedure is a bit more complicated, because of cross-references. Here's an example:

    $ format -Thtml foo.sgml | prehtml | fixref > tmp.html
    $ format -Thtml foo.sgml | prehtml >> tmp.html
    $ cat tmp.html | html2html foo > foo.html
    $ rm tmp.html

This will produce foo.html, as well as foo-1.html, foo-2.html, and so on---one file for each section of the document. Run your WWW client on foo.html, which is the toplevel file. Also make sure that all of the HTML files corresponding to your document are in one directory, as they reference each other with local URLs. A good way to test this would be to run it on this file, guide.sgml.

If you just want to capture your errors from the SGML conversion, use something like

    $ format -Tnroff foo > /dev/null

2.3. Development note

The HTML conversion is, at this time, rudimentary but adequate. In the future there will be support for cross-references, navigation buttons, external URLs, and the like. Something is better than nothing. :)

Also, if you'd like to help me implement a texinfo (or plain Info) conversion for Linuxdoc-SGML, let me know!
As with HTML we'll have to do some pre- and post-processing (which you supposedly shouldn't need with SGML, ah well), but that's not a big issue.

3. Writing Documents with linuxdoc-sgml

For the most part, writing documents using the linuxdoc DTD is very simple, and somewhat like LaTeX. However, there are some caveats to watch out for. In this section I'll give an introduction on writing SGML docs. See the file example.sgml for an SGML example document (and tutorial) which you can use as a model when writing your own docs. Here I'm just going to discuss the various features of SGML, but the source of this document is not very readable as an example. Instead, print out the source (as well as the formatted output) for example.sgml so you have a real live case to refer to.

3.1. Basic concepts

Looking at the source of the example document, you'll notice right off that there are a number of ``tags'' marked within angle brackets (< and >). A tag simply specifies the beginning or end of an element, where an element is something like a section, a paragraph, a phrase of italicized text, an item in a list, and so on. Using a tag is like using a LaTeX command such as \item or \section{...}.

As a simple example, to produce this boldfaced text, I typed

    As a simple example, to produce <bf>this boldfaced text</bf>, ...

in the source. <bf> begins the region of bold text, and </bf> ends it. Alternatively, you can use the abbreviated form

    As a simple example, to produce <bf/this boldfaced text/, ...

which encloses the bold text within slashes. (Of course, you'll need to use the long form if the enclosed text contains slashes, as is the case with UNIX filenames.)

There are other things to watch out for with respect to special characters (that's why you'll notice all of these bizarre-looking ampersand expressions if you look at the source; I'll talk about those shortly).

In some cases, the end-tag for a particular element is optional.
For example, to begin a section, you use the <sect> tag; however, the end-tag for the section (which could appear at the end of the section body itself, not just after the name of the section!) is optional and implied when you start another section of the same depth. In general you needn't worry about these details; just follow the model used in the tutorial (example.sgml), and feel free to ask me if you have any questions about the particulars.

3.2. Special characters

Obviously, the angle brackets are themselves special characters in the SGML source. There are others to watch out for. For example, let's say that you wanted to type an expression with angle brackets around it, as so: <foo>. In order to get the left angle bracket, you must use the &lt; entity, which is a ``macro'' that expands to the actual left-bracket character. Therefore, in the source, I typed

    angle brackets around it, as so: <tt>&lt;foo&gt;</tt>.

Generally, something beginning with an ampersand is a special macro. For example, there's &percnt; to produce %, &verbar; to produce |, and so on. For all ``special characters'' there exist these ampersanded entities to represent them.

Usually, you don't need to use the ampersand macro to get a special character; however, in some cases it is necessary. The most commonly used are:

o Use &amp; for the ampersand (&),

o Use &lt; for a left bracket (<),

o Use &gt; for a right bracket (>),

o Use &etago; for a left bracket with a slash (</),

o Use &dollar; for a dollar sign ($),

o Use &num; for a hash (#),

o Use &percnt; for a percent (%),

o Use `` and '' for quotes, or use &dquot; for ".

3.3. Verbatim and code environments

While we're on the subject of special characters, I might as well mention the verbatim ``environment'' used for including literal text in the output (with spaces and indentation preserved, and so on). The verb element is used for this; it looks like the following:

    <verb>
     Some literal text to include as example output.
    </verb>

The verb environment doesn't allow you to use everything within it literally. Specifically, you must do the following within verb environments.

o Use &ero; to get an ampersand,

o Use &etago; to get </,

o Don't use \end{verbatim} within a verb environment, as this is what LaTeX uses to end the verbatim environment. (In the future, it should be possible to hide the underlying text formatter entirely, but the parser doesn't support this feature yet.)

The code environment is much like the verb environment, except that horizontal rules are added to the surrounding text, as so:

    ___________________________________________________________________
    Here is an example code environment.
    ___________________________________________________________________

You should use the tscreen environment around any verb environments, as so:

    <tscreen><verb>
    Here is some example text.
    </verb></tscreen>

tscreen is an environment that simply indents the text and sets the default font to tt. This makes examples look much nicer, both in the LaTeX and plain ASCII versions. You can use tscreen without verb; however, if you use any special characters in your example you'll need to use both of them. tscreen does nothing to special characters. See example.sgml for examples.

The quote environment is like tscreen, except that it does not set the default font to tt. So, you can use quote for non-computer-interaction quotes, as in:

    <quote>
    Here is some text to be indented, as in a quote.
    </quote>

which will generate:

    Here is some text to be indented, as in a quote.

3.4. Overall document structure

Before we get too in-depth with details, I'm going to describe the overall structure of a document as defined by the linuxdoc DTD. Look at example.sgml for a good example of how a document is set up.

3.4.1. The preamble

In the document ``preamble'' you set up things such as the title information and document style.
For a Linux HOWTO document this should look like:

    <!doctype linuxdoc system>

    <article>

    <title>The Linux Food-Processing HOWTO
    <author>Norbert Ebersol, <tt/norbert@foo.com/
    <date>v1.0, 9 March 1994
    <abstract>
    This document describes how to connect your Linux machine to a
    food-processor for dicing vegetables.
    </abstract>

    <toc>

The elements should go more or less in this order. The first line tells the SGML parser to use the linuxdoc DTD. The <article> tag forces the document to use the ``article'' document style. (The original QWERTZ DTD defines ``report'' and ``book'' as well; I haven't tweaked these for use with linuxdoc-sgml. Just use article for your SGML docs, for now.)

The title, author, and date tags should be obvious; in the date tag include the version number and last modification time of the document.

The abstract tag sets up the text to be printed at the top of the document, before the table of contents. If you're not going to include a table of contents (the toc tag), you probably don't need an abstract. I suggest that all Linux HOWTOs use this same format for the preamble, so that the title, abstract, and table of contents are all there and look the same.

3.4.2. Sectioning and paragraphs

After the preamble, you're ready to dive into the document. The following sectioning commands are available:

o sect: For top-level sections (i.e. 1, 2, and so on.)

o sect1: For second-level subsections (i.e. 1.1, 1.2, and so on.)

o sect2: For third-level subsubsections.

o sect3: For fourth-level subsubsubsections.

o sect4: For fifth-level subsubsubsubsections.

These are roughly equivalent to their LaTeX counterparts section, subsection, and so on.

After the sect (or sect1, sect2, etc.) tag comes the name of the section.
For example, at the top of this document, after the preamble, comes the tag:

    <sect>Introduction

And at the beginning of this section (Sectioning and paragraphs), there is the tag:

    <sect2>Sectioning and paragraphs

After the section tag, you begin the body of the section. However, you must start the body with a <p> tag, as so:

    <sect>Introduction
    <p>
    This is a user's guide to the <tt/linuxdoc-sgml/ document processing...

This is to tell the parser that you're done with the section title and are ready to begin the body. Thereafter, new paragraphs are started with a blank line (just as you would do in TeX). For example,

    Here is the end of the first paragraph.

    And we start a new paragraph here.

There is no reason to use <p> tags at the beginning of every paragraph; only at the beginning of the first paragraph after a sectioning command.

3.4.3. Ending the document

At the end of the document, you must use the tag:

    </article>

to tell the parser that you're done with the article element (which embodies the entire document).

3.5. Cross-references

Now we're going to move on to other features of the system. Cross-references are easy. For example, if you want to make a cross-reference to a certain section, you need to label that section as so:

    <sect1><heading><label id="sec-intro">Introduction</>

You can then refer to that section somewhere in the text using the expression:

    See section <ref id="sec-intro" name="Introduction"> for an introduction.

This will replace the ref tag with the section number labelled as sec-intro. The name argument to ref is necessary for nroff and HTML translations (at the moment). The nroff macro set used by Linuxdoc-SGML does not currently support cross-references, and it's often nice to refer to a section by name instead of number.

For example, this section is ``Cross-references''.

There is also a url element for Universal Resource Locators, or URLs, used on the World Wide Web.
This element should be used to refer to other documents, files available for FTP, and so forth. For example,

    You can get the Linux HOWTO documents from
    <url url="http://sunsite.unc.edu/mdw/linux.html"
         name="the Linux Documentation Project home page">.

The url argument specifies the actual URL itself. A link to the URL in question will be automatically added to the HTML document. The optional name argument specifies the text that should be anchored to the URL (for HTML conversion) or named as the description of the URL (for LaTeX and nroff). If no name argument is given, the URL itself will be used.

For example, you can get the Linuxdoc-SGML package from (ftp://ftp.cs.cornell.edu/mdw/linuxdoc-sgml-1.1.tar.gz).

3.6. Fonts

Essentially, the same fonts supported by LaTeX are supported by linuxdoc-sgml. Note, however, that the conversion to plain ASCII (through groff) does away with the font information---I might hack up plain-ASCII representations of the various fonts if the need arises. So, you should use fonts as much as possible, for the benefit of the conversion to LaTeX. But don't depend on the fonts to get a point across in the plain ASCII version.

In particular, the tt tag described above can be used to get constant-width ``typewriter'' font, which should be used for all e-mail addresses, machine names, filenames, and so on. Example:

    Here is some <tt>typewriter text</tt> to be included in the document.

Equivalently:

    Here is some <tt/typewriter text/ to be included in the document.

Remember that you can only use this abbreviated form if the enclosed text doesn't contain slashes.

Other fonts can be achieved with bf for boldface and em for italics. Several other fonts are supported as well, but I don't suggest you use them, because we'll be converting these documents to other formats such as HTML which may not support them. Boldface, typewriter, and italics should be all that you need.

3.7. Lists

There are various kinds of supported lists.
They are:

o itemize for bulleted lists such as this one.

o enum for numbered lists.

o descrip for ``descriptive'' lists.

Each item in an itemize or enum list must be marked with an item tag. Items in a descrip are marked with tag. For example,

    <itemize>
    <item>Here is an item.
    <item>Here is a second item.
    </itemize>

looks like this:

    o Here is an item.

    o Here is a second item.

Or, for an enum,

    <enum>
    <item>Here is the first item.
    <item>Here is the second item.
    </enum>

You get the idea. Lists can be nested as well; see the example document for details.

A descrip list is slightly different, and slightly ugly, but you might want to use it for some situations:

    <descrip>
    <tag/Gnats./ Annoying little bugs that fly into your cooling fan.
    <tag/Gnus./ Annoying little bugs that run on your CPU.
    </descrip>

ends up looking like:

    Gnats.
        Annoying little bugs that fly into your cooling fan.

    Gnus.
        Annoying little bugs that run on your CPU.

3.8. Miscellany

There are various other esoteric features in the system as well, most of which you probably won't use. If you're curious, read the QWERTZ User's Guide (from ftp.cs.cornell.edu in pub/mdw/SGML). QWERTZ (and hence, linuxdoc) supports many features such as mathematical formulae, tables, figures, and so forth. I don't recommend using most of these features in the Linux HOWTOs because they won't render well in plain ASCII.

If you'd like to write general documentation in SGML, I suggest using the original QWERTZ DTD instead of the hacked-up linuxdoc DTD, which I've modified for use particularly by the Linux HOWTOs and other documentation.

The bottom line is, linuxdoc-sgml supports many other features found in the QWERTZ DTD, but I haven't necessarily tweaked them to work well with linuxdoc-sgml. If you encounter problems with any of them, please let me know.

4. The Linux HOWTO project

How does this tie into writing HOWTOs? First of all, I'd like to see everyone eventually convert their HOWTOs to SGML using this DTD.
This has a number of advantages. First of all, it will allow you to just send me the SGML source, which I'll convert to plain ASCII, TeX, whatever, for posting and archiving. Also, it will give the HOWTOs a common look and feel; any changes that I make to the DTD will be reflected in all of the HOWTOs.

I have set up the linuxdoc DTD to have a certain look and feel. If you want your document to look different, please let me know, because I'll need to make those changes in the DTD itself. That is, do not modify your version of the DTD or replacement files to get other features in the system. We all must use the same DTD and replacement files or this whole system will break down. If you find bugs in it, or have suggestions for how we can change things or add/modify features, let me know. I'll be more than happy to accommodate you.
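
For anyone who wants to check several HOWTOs before sending them in, the plain ASCII and PostScript commands from section 2.2 can be wrapped in a loop. The following is a rough sketch only: the function name convert_all and the DRY_RUN switch are invented for this example and are not part of linuxdoc-sgml, and it assumes format, qroff, and qtex are on your path.

```shell
#!/bin/sh
# Sketch: run (or just print) the section 2.2 conversions for each document.
# ASSUMPTION: convert_all and DRY_RUN are illustrative names only.
convert_all() {
    for doc in "$@"; do
        base=${doc%.sgml}    # strip the .sgml suffix, if present
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Show the commands without running them.
            echo "format -Tnroff $base | qroff > $base.txt"
            echo "format -Tlatex $base | qtex > $base.ps"
        else
            format -Tnroff "$base" | qroff > "$base.txt"
            format -Tlatex "$base" | qtex > "$base.ps"
        fi
    done
}
```

For example, DRY_RUN=1 convert_all foo.sgml bar.sgml lists the four commands that would be run, which is a quick way to check the file names first.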